Abstract

The proposed method is an extension of an existing Kalman Filter
(KF) ensemble method. While the original method showed great
promise in the PHM 2008 Data Challenge, the main limitation of
the KF ensemble is that it is only applicable to linear models.
In prognostics, degradation of mechanical systems is typically
non-linear in nature, which limits the application of the KF
ensemble in this area. To circumvent this problem, this paper
proposes approximating non-linear functions with piecewise linear
functions. When estimating the Remaining Useful Life (RUL), the
Switching Kalman Filter (SKF) is able to choose the most probable
degradation mode and thus make better predictions. The proposed
SKF ensemble method is demonstrated on NASA's C-MAPSS dataset as
well as the PHM 2008 Data Challenge dataset. The results show the
effectiveness of the SKF in detecting the switching point between
degradation modes, as well as the improved accuracy of the SKF
ensemble method compared to other methods available in the
literature.

1.  Introduction 

In recent years, Condition Based Maintenance (CBM) has been
garnering more attention as it allows industries to better plan
logistics and to save cost by replacing parts only when needed.
Prognostics, being one of the key enablers of CBM, has therefore
also gained more interest in both academia and industry. The key
notion of prognostics, albeit not the only one, is to determine
the time remaining before a likely failure. This value is commonly
termed the Remaining Useful Life (RUL) of the system.

In this paper, a novel prediction algorithm is presented which is
applicable to non-linear degradation models. The algorithm assumes
that the degradation model can be described by a number of
piecewise linear functions. With each of these linear functions
describing a linear model, the most suitable model to describe the
degradation at any point in time is chosen based on the Switching
Kalman Filter (SKF) algorithm. The remainder of this paper is
structured as follows. Section 2 introduces the datasets used to
evaluate the effectiveness of the algorithm. Section 3 presents a
simple single neural network approach to evaluate the difficulty
of the problem. Finally, Section 4 presents and evaluates the SKF
ensemble approach.

2.  Dataset 

A total of two datasets are used in this paper: the PHM 2008 Data
Challenge dataset and the NASA C-MAPSS dataset (Saxena & Goebel,
2008). The C-MAPSS dataset is further divided into four
sub-datasets, as shown in Table 1. Both datasets contain simulated
data produced using a model-based simulation program, the
Commercial Modular Aero-Propulsion System Simulation (C-MAPSS),
developed by NASA (Saxena, Goebel, Simon, & Eklund, 2008).

Table 1. Dataset details (simulated from C-MAPSS)

Dataset             |            C-MAPSS            | PHM 2008
                    | FD001 | FD002 | FD003 | FD004 |
Train trajectories  |  100  |  260  |  100  |  248  |   218
Test trajectories   |  100  |  259  |  100  |  248  |   218
Conditions          |   1   |   6   |   1   |   6   |    6
Fault modes         |   1   |   1   |   2   |   2   |    2

The data is arranged in an n-by-26 matrix, where n corresponds to
the number of data points in each dataset. Each row is a snapshot
of data taken during a single operational cycle and each column
represents a different variable. Included in the data are three
operational settings that have a substantial effect on engine
performance.

Each trajectory within the train and test sets is assumed to be
the life cycle of an engine. While each engine is simulated with
different initial conditions, these conditions are considered
normal (no faults). For each engine trajectory within the training
sets, the last data entry corresponds to the moment the engine is
declared unhealthy. The test sets, on the other hand, terminate at
some time prior to failure, and the aim is to predict the
Remaining Useful Life (RUL), in number of cycles, of each engine
in the test set.

For each of the C-MAPSS datasets, the actual RUL values of the
test trajectories were made available to the public, while the
actual RUL values of the PHM 2008 test dataset are not available.
However, users can submit their results to the NASA website to
obtain a score, limited to one submission per day. Due to this
constraint, most of the analysis in this paper is based on the
NASA C-MAPSS dataset rather than the PHM 2008 dataset. The PHM
2008 dataset is instead used for comparison against other
algorithms proposed in the literature.


2.1. Evaluation Metrics

2.1.1. Scoring Function

The scoring function used in this paper is identical to that used
in the PHM 2008 Data Challenge. It is given in Eq. (1), where $s$
is the computed score, $N$ is the number of engines, and
$d_i = \widehat{\mathrm{RUL}}_i - \mathrm{RUL}_i$ (estimated RUL
minus true RUL) for engine $i$.

$$
s = \sum_{i=1}^{N} s_i, \qquad
s_i =
\begin{cases}
e^{-d_i/13} - 1 & \text{for } d_i < 0 \\
e^{d_i/10} - 1 & \text{for } d_i \geq 0
\end{cases}
\tag{1}
$$

The characteristic of this scoring function is that it favours
early predictions over late predictions. This is in line with the
risk-averse attitude in the aerospace industry. However, there are
several drawbacks with this function. The most significant is that
a single outlier can dominate the overall score, thus masking the
true accuracy of the algorithm. Another drawback is the lack of
consideration of the prognostic horizon of the algorithm. The
prognostic horizon assesses how long before failure the algorithm
is able to estimate the RUL accurately within a certain confidence
level. Finally, this scoring function favours algorithms which
artificially lower the score by underestimating the RUL. Despite
these shortcomings, this scoring function is still used in this
paper in order to provide a level comparison with other methods in
the literature.

2.1.2. RMSE

In addition to the scoring function, the Root Mean Square Error
(RMSE) of the estimated RULs is also used as a performance
measure. RMSE is chosen as it gives equal weight to both early and
late predictions. Using RMSE in conjunction with the scoring
function prevents the user from favouring an algorithm which
artificially lowers the score by underestimating the RUL at the
cost of a higher RMSE. Furthermore, several studies working on
this dataset use RMSE to evaluate their algorithms. Including the
RMSE therefore allows the results to be compared with those
available in the literature.

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} d_i^2}
\tag{2}
$$

A comparative plot of the two evaluation metrics is shown in
Figure 1. It can be observed that at lower absolute error values
the scoring function results in lower values than the RMSE. The
relative characteristics of the two evaluation metrics will be
useful during the discussion of results in the latter part of this
paper.


Figure 1. Comparison of evaluation metric values for different
error values (scoring function against RMSE for a single engine,
N = 1; horizontal axis: error value d_i)
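
The two evaluation metrics of Section 2.1 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used for the experiments: the function names are hypothetical, and the constants 13 and 10 are taken from the PHM 2008 Data Challenge definition of the score.

```python
import numpy as np

def phm08_score(rul_est, rul_true):
    """PHM 2008 score: sum of asymmetric exponential penalties."""
    d = np.asarray(rul_est, dtype=float) - np.asarray(rul_true, dtype=float)
    # Early predictions (d < 0) are penalised less than late ones (d >= 0).
    s = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return float(np.sum(s))

def rmse(rul_est, rul_true):
    """Root mean square error of the RUL estimates (symmetric in d)."""
    d = np.asarray(rul_est, dtype=float) - np.asarray(rul_true, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))
```

For the same absolute error, an early prediction scores lower than a late one, while the RMSE treats both equally; this is the asymmetry discussed above.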


2.2.  Data  Preparation 

2.2.1.  Operating  Conditions 

Several studies (Wang et al., 2008; Peel, 2008; Heimes, 2008) have
shown that, when the operational setting values are plotted, the
data points fall into six distinct clusters. This observation only
applies to datasets with different operational conditions; data
points from FD001 and FD003 are all clustered at a single point
instead. These clusters are assumed to correspond to the six
different operating conditions. It is therefore possible to
include the operational condition history as a feature. This is
done by adding 6 columns of data representing the number of cycles
spent in each operational condition since the beginning of the
series (Peel, 2008).

Figure 2. Sensor values (a) before and (b) after normalization
(horizontal axes: cycles)
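
As a rough sketch of the operating-condition history feature, the snippet below builds the six extra columns for a single engine. It assumes that each cycle has already been assigned an integer cluster label (0 to 5) and that the count includes the current cycle; both the function name and these conventions are illustrative assumptions, not details given in the text.

```python
import numpy as np

def condition_history(cond_ids, n_conditions=6):
    """Cumulative number of cycles spent in each operating condition.

    cond_ids: one integer label per cycle of a single engine, in
              0..n_conditions-1 (cluster labels are assumed given).
    Returns an array of shape (n_cycles, n_conditions).
    """
    cond_ids = np.asarray(cond_ids, dtype=int)
    # One-hot encode the condition of each cycle, then accumulate over time.
    one_hot = np.eye(n_conditions, dtype=int)[cond_ids]
    return np.cumsum(one_hot, axis=0)
```

The resulting columns can be appended to the sensor matrix of the engine before training.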

2.2.2.  Data  Normalization 

Each of the 6 operating conditions results in disparate sensor
values, as shown in Figure 2. Therefore, prior to any training and
testing, it is imperative to normalize the data points to be
within the range [-1, 1] using Eq. (3). As normalization is
carried out within the range of values for each sensor and each
operating condition, this ensures equal contribution from all
features across all operating conditions (Peel, 2008).
Alternatively, it is also possible to incorporate operating
condition information within the data to take the various
operating conditions into consideration.

$$
\mathrm{Norm}\left(x^{(c,f)}\right) =
2\,\frac{x^{(c,f)} - x^{(c,f)}_{\min}}{x^{(c,f)}_{\max} - x^{(c,f)}_{\min}} - 1,
\qquad \forall c, f
\tag{3}
$$

where $c$ represents the operating conditions and $f$ represents
each of the original 21 sensors.
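
A minimal sketch of this per-condition, per-sensor min-max scaling is given below. The function name is hypothetical, the operating-condition labels are assumed to be available, and mapping constant sensors to 0 is an added convention to avoid division by zero. In practice the minimum and maximum would presumably be computed on the training set and reused for the test set.

```python
import numpy as np

def normalize_per_condition(X, cond_ids):
    """Scale each sensor to [-1, 1] separately within each condition.

    X:        array of shape (n_samples, n_sensors)
    cond_ids: operating-condition label of each sample (assumed given)
    """
    X = np.asarray(X, dtype=float)
    cond_ids = np.asarray(cond_ids)
    out = np.zeros_like(X)
    for c in np.unique(cond_ids):
        rows = cond_ids == c
        x_min = X[rows].min(axis=0)
        x_max = X[rows].max(axis=0)
        span = x_max - x_min
        # Constant sensors (span == 0) are mapped to 0 (assumed convention).
        safe_span = np.where(span > 0, span, 1.0)
        scaled = 2.0 * (X[rows] - x_min) / safe_span - 1.0
        out[rows] = np.where(span > 0, scaled, 0.0)
    return out
```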

3.  Single  Neural  Network  Approach 

3.1.  Method  Description 

The aim of this section is two-fold. Firstly, prior to
experimenting with other methods, the complexity of the problem is
tested using a single Multi-Layer Perceptron (MLP) network to
obtain a baseline performance. This baseline performance is then
used for comparison with the accuracy of the proposed method.
Secondly, the method is used to evaluate the performance of the
two different RUL functions presented in Section 3.2 below.

3.2.  Arbitrary  RUL  Function 

In their crudest form, prognostic algorithms are similar to
regression problems. However, unlike typical regression problems,
an inherent challenge for data-driven prognostic problems is
determining the desired output value for each input data point.
This is because, in real-world applications, it is impossible to
accurately determine the health of the system at each time step
without an accurate physics-based model. A sensible solution would
be to simply assign the desired output as the actual time left
before functional failure (Peel, 2008; Baraldi, Mangili, & Zio,
2012). This approach, however, inadvertently implies that the
health of the system degrades linearly with usage (Figure 3a).

An alternative approach is to derive the desired output values
based on a suitable degradation model. For this dataset, Heimes
(2008) proposed a piecewise linear degradation model which limits
the maximum value of the RUL function (Figure 3b). The maximum
value was chosen based on observations of the data, and its
numerical value is different for each dataset. For the sake of
simplicity, the former will be referred to as the 'linear
function' while the latter will be referred to as the 'kink
function' in the remainder of the paper.


Figure 3. Comparison of degradation models: (a) linear degradation
model, (b) piecewise linear degradation model


Each of these approaches has its own advantages. The kink function
is more likely to prevent the neural network from overestimating
the RUL. It is also a more logical model, as the degradation of
the system typically only starts after a certain degree of usage.
On the other hand, the linear function follows the definition of
RUL in the strictest sense, namely the time to failure. The plot
of the time left of a system against the time passed therefore
naturally results in the linear function shown in Figure 3a. It
should also be noted that, in cases where knowledge of a suitable
degradation model is unavailable, the linear model is the most
natural choice.
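
The two target functions of this section can be sketched as follows for a single training engine. The function names are hypothetical, the RUL is assumed to reach 0 at the last recorded cycle, and `max_rul` is a placeholder argument: the dataset-specific maximum values are not stated in the text.

```python
import numpy as np

def linear_rul(n_cycles):
    """Linear RUL target: counts down to 0 at the last recorded cycle."""
    return np.arange(n_cycles - 1, -1, -1, dtype=float)

def kink_rul(n_cycles, max_rul):
    """Kink RUL target: constant at max_rul, then linear degradation."""
    return np.minimum(linear_rul(n_cycles), float(max_rul))
```

For a run of 5 cycles, `linear_rul(5)` gives `[4, 3, 2, 1, 0]`, while `kink_rul(5, 2)` gives `[2, 2, 2, 1, 0]`: the early part of the life is clipped to a constant and only the final part degrades linearly.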

3.3.  Results 

For each sub-dataset within the C-MAPSS dataset, two MLPs were
individually trained using the linear and kink RUL functions as
desired outputs. The MLPs were then tested on the corresponding
test sub-datasets and evaluated using Eq. (1) and Eq. (2). Due to
the inherent noise in the data, the whole training and testing
process was repeated for a total of 10 trials in order to capture
the variance of each MLP. The results from these trials are
expressed in the form of box plots, shown in Figures 4 and 5.

Figure 4. Scores of MLP trained with linear and kink RUL functions
(panels: FD001, FD002, FD003, FD004).


Figure 4 shows that using the linear RUL function results in much
higher variance in scores. However, considering the RMSE plots
(Figure 5), the variance of the RMSE values within each dataset is
relatively similar. The higher variance in scores is therefore due
to the nature of the scoring function: its exponential term can
cause large deviations in the score due to a single inaccurate
estimate. The variance of the RMSE values for both MLPs could be
attributed to the inability of a single MLP to handle noisy input
data.

Figure 5. RMSE of MLP trained with different RUL functions.

More importantly, all datasets show significant improvements in
both RMSE and scores when the kink RUL function is used. The lower
RMSE values obtained with the kink RUL function (Figure 5) are
evidence that the corresponding lower scores in Figure 4 are due
to more accurate predictions rather than to induced
underestimation of the RUL. These results agree with Heimes (2008)
that the kink RUL function is a much more suitable degradation
model for these datasets.


4.  Switching  Kalman  Filter  (SKF)  Ensemble 

4.1.  Method  Description 

In order to improve on the prognostic accuracy of the single MLP
implemented in Section 3.3, ensemble methods are explored to
develop a more accurate and robust prognostic method. Ensemble
methods are generally used to combine multiple weak classifiers
into a single strong classifier. It has been found that ensembles
have higher accuracy and generalizability if the individual
ensemble members are accurate and make errors on different parts
of the input space (Maclin & Opitz, 2011). There are generally two
main steps in creating an ensemble: the first is to create the
individual ensemble members, and the second is to combine the
outputs of the ensemble members.

For the ensemble to generate better results, its generalization
must be improved. This can be achieved by having diversity among
the ensemble members. The most commonly used methods for creating
ensemble members are input data sampling techniques such as
Bagging and Boosting (Zhou, 2012; Re & Valentini, 2011). In this
paper, networks with different topologies are used as ensemble
members, as this method has fewer variables to tune than Boosting
and Bagging.

The outputs of the ensemble members are usually combined by taking
a weighted mean or median (Zhou, 2012). The weights are usually
determined based on the training error of each ensemble member
(Krogh & Vedelsby, 1995). Peel (2008) proposed an alternative
combination method which uses a Kalman filter to combine the
outputs of several neural networks. This method showed great
promise by winning the IEEE Gold for the PHM 2008 Data Challenge.
In that work, both the training function for the neural networks
and the model used in the Kalman filter assume a linear
degradation function, which limits its application to linear
cases. This section extends the method by using a Switching Kalman
Filter (SKF) for piecewise linear applications, thus allowing a
similar ensemble to be implemented for other degradation patterns.

4.2.  Ensemble  Members 

In this paper, MLPs with different numbers of hidden neurons are
used as ensemble members. The number of hidden neurons was
randomly picked from a uniform distribution of integers between 5
and 25 inclusive. The maximum number of hidden neurons was limited
to prevent overfitting on the training set, thus helping
generalization to unseen data points. A total of 4 ensemble
members were generated per ensemble.
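
The topology sampling described above could look like the following sketch. Only the range (5 to 25 inclusive) and the number of members (4) come from the text; the function name and the use of NumPy's random generator are illustrative assumptions.

```python
import numpy as np

def sample_hidden_sizes(n_members=4, low=5, high=25, seed=None):
    """Draw one hidden-layer size per ensemble member, uniformly at random."""
    rng = np.random.default_rng(seed)
    # Generator.integers excludes the upper bound by default, hence high + 1.
    return rng.integers(low, high + 1, size=n_members)
```

Each sampled size would then define the hidden layer of one MLP in the ensemble.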

4.3.  Aggregation  based  on  Kalman  Filter  (KF) 

The KF and its variants have been widely used in machine learning
applications. These applications range from simple state
prediction (Borguet & Leonard, 2009) to training of neural network
weights using the Extended Kalman Filter (EKF) (Singhal & Wu,
1989; Puskorius & Feldkamp, 1991). In this paper, the traditional
KF and its variant, the SKF, are used.

4.3.1.  Kalman  Filter 

The most common application of the KF is as a forward-pass state
estimator. The filter predicts the hidden states for the next time
step given the history of estimated states and noisy observed
outputs. The predicted states are considered optimal as the filter
aims to minimize the uncertainty in the estimate (AL-Mathami,
Everson, & Fieldsend, 2012). Prior to using the KF, the system
must be modeled as a linear system of the form

$$
\begin{aligned}
x_t &= A x_{t-1} + w_t \\
z_t &= H x_t + v_t
\end{aligned}
\tag{4}
$$

where $x_t$ is the state vector at time $t$, $A$ is the transition
matrix, $z_t$ represents the output observations, $H$ is the
observation matrix, and $w_t$ and $v_t$ are the process noise and
observation noise respectively. Based on this model, a recursive
process is carried out in which the prediction step is given by

$$
\begin{aligned}
\hat{x}_t^- &= A \hat{x}_{t-1} \\
P_t^- &= A P_{t-1} A^T + Q
\end{aligned}
\tag{5}
$$

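where $P_t^-$ is the predicted (a priori) state covariance matrix,
$\hat{x}_t^-$ is the predicted state, and $Q$ is the process error
covariance matrix. The KF then updates the estimate based on the
new observations. The updating step is carried out by the
following equations

$$
\begin{aligned}
K_t &= P_t^- H^T \left[ H P_t^- H^T + R \right]^{-1} \\
\hat{x}_t &= \hat{x}_t^- + K_t \left[ z_t - H \hat{x}_t^- \right] \\
P_t &= \left[ I - K_t H \right] P_t^-
\end{aligned}
\tag{6}
$$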

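where $R$ is the observation error covariance matrix and $K_t$ is
the Kalman gain at time $t$. For illustrative purposes, the state
$x_t$ is chosen as

$$
x_t = \begin{bmatrix} \mathrm{RUL}_t \\ \Delta\mathrm{RUL}_t \end{bmatrix},
\qquad
\Delta\mathrm{RUL}_t = \mathrm{RUL}_t - \mathrm{RUL}_{t-1}
\tag{7}
$$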


It is therefore straightforward to express the kink RUL function
as a piecewise linear function, with the respective linear KF
models expressed as

$$
A_c = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
\qquad
A_l = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\tag{8}
$$

where $A_c$ is the model for the initial constant RUL phase and
$A_l$ is the model for the linear degradation phase, assuming a
gradient of $-1$ for the linear degradation phase. In addition,
the outputs from the individual neural networks are taken to be
the observations. The observation vector $z_t$ and the observation
matrix $H$ are therefore set as

$$
z_t = \begin{bmatrix} \mathrm{RUL}_1 \\ \vdots \\ \mathrm{RUL}_n \end{bmatrix},
\qquad
H = \begin{bmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \end{bmatrix}
\tag{9}
$$

where $\mathrm{RUL}_n$ is the output of the $n$th neural network
in the ensemble. Further details of modeling the ensemble outputs
are covered in Peel (2008) and Baraldi et al. (2012).
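
To make the aggregation concrete, the sketch below runs one predict/update cycle of a standard KF with the two-component RUL state, the two transition models for the constant and linear phases, and an observation matrix that stacks one row per ensemble member. The function names and all numerical values of the covariances are placeholders chosen for illustration; they are not taken from the paper.

```python
import numpy as np

# Transition models: constant RUL phase and linear degradation phase.
A_c = np.eye(2)
A_l = np.array([[1.0, 1.0],
                [0.0, 1.0]])

def make_H(n_members):
    """Each ensemble member observes the RUL component of the state."""
    return np.tile(np.array([[1.0, 0.0]]), (n_members, 1))

def kf_step(x, P, z, A, H, Q, R):
    """One Kalman filter cycle: prediction followed by update."""
    # Prediction step
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update step
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

With a state of `[100, -1]` and four member outputs `[95, 97, 99, 101]`, the linear model predicts an RUL of 99, and the update pulls the estimate towards the member average of 98.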


4.3.2.  Kalman  Smoother 

In contrast to the KF, which estimates the optimal state given
observations up to time $t$, the Kalman smoother aims to estimate
the optimal state at time $t$ given the observations from 1 to
$T$, where $T$ represents the total length of the observation data
(AL-Mathami et al., 2012). The Kalman smoother is an analogous
backward recursive process which estimates the states starting
from the end of the observation data. Combining the forward and
backward passes therefore gives the optimal estimated state given
the whole observation data.

At the last time step, the smoothed variables are initialized as

$$
\begin{aligned}
\tilde{x}_T &= \hat{x}_T \\
\tilde{P}_T &= P_T
\end{aligned}
\tag{10}
$$

where $\tilde{x}$ is the smoothed state and $\tilde{P}$ is the
smoothed covariance. The smoothed states can then be calculated
based on the following recursive equations, where $t$ decreases
from $T - 1$ to 1 (AL-Mathami et al., 2012).

$$
\begin{aligned}
J_t &= P_t A^T \left( P_{t+1}^- \right)^{-1} \\
\tilde{x}_t &= \hat{x}_t + J_t \left( \tilde{x}_{t+1} - A \hat{x}_t \right) \\
\tilde{P}_t &= P_t + J_t \left( \tilde{P}_{t+1} - P_{t+1}^- \right) J_t^T
\end{aligned}
\tag{11}
$$

Here $\hat{x}_t$ and $P_t$ are the filtered state and covariance
from the forward pass, and $P_{t+1}^-$ is the predicted covariance
from the forward pass.

4.4.  Switching  Kalman  Filter  (SKF) 

Eq. (8) in the earlier section shows that the kink degradation
function can be modeled using two linear systems. The outputs of
the ensemble members therefore need to be combined using the
appropriate KF model. This problem is further compounded by the
fact that the switching point between the two models differs for
every engine, making it difficult to pre-define a rule for
switching between the two models. To circumvent this problem, an
SKF (Murphy, 1998; AL-Mathami et al., 2012) is implemented to
autonomously determine the switching point.

In this application, the SKF predicts the most probable hidden
discrete model given the observations and the models. The
graphical probabilistic model of the SKF for aggregating ensemble
methods is shown in Figure 6.

Figure 6. Directed graphical probabilistic model of SKF

Based on the figure, the SKF determines the sequence of models
which would most likely result in the series of observations.
Similar to the KF, the SKF computes the posterior probability of
the model given the observations in two passes. The forward pass
calculates $P(S_t = j \mid x_t, x_{1:t-1})$ while the backward
pass calculates $P(S_t = j \mid x_{1:T})$. An illustrative example
of the forward pass calculation is shown below.

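For each $i$, $j$:

$$
\begin{aligned}
P(S_t = j \mid x_t, x_{1:t-1})
&= \frac{P(x_t \mid S_t = j, x_{1:t-1})\, P(S_t = j \mid x_{1:t-1})}{P(x_t \mid x_{1:t-1})} \\
&= \frac{1}{c}\, L_t(j) \sum_i Z(i,j)\, P(S_{t-1} = i \mid x_{1:t-1})
\end{aligned}
\tag{12}
$$

where

$$
\begin{aligned}
c &= P(x_t \mid x_{1:t-1}) = \sum_j L_t(j) \sum_i Z(i,j)\, P(S_{t-1} = i \mid x_{1:t-1}) \\
L_t(j) &= P(x_t \mid S_t = j, x_{1:t-1}) \sim N(x_t;\, A_j x_{t-1},\, Q_j) \\
Z(i,j) &= P(S_t = j \mid S_{t-1} = i, x_{1:t-1})
\end{aligned}
\tag{13}
$$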

It should be noted that $Z(i, j)$ is a predefined transition
matrix which contains the probability of transitioning from model
$i$ to model $j$. Based on the calculated posterior probability,
the most probable model can then be chosen. The backward pass can
be calculated in a similar manner and will therefore not be
repeated here. For more details on the SKF, readers can refer to
Murphy (1998) and AL-Mathami et al. (2012).
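
The forward-pass model probability update can be sketched as below. This covers only the discrete part of the SKF, i.e. the posterior over models given the per-model likelihoods; a full SKF also runs and merges the per-model Kalman filters, which is omitted here. The function names and the numerical values in the usage note are illustrative assumptions.

```python
import numpy as np

def gaussian_likelihood(x, mean, cov):
    """Density of a multivariate normal N(x; mean, cov), usable as L_t(j)."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    k = d.shape[0]
    norm = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(cov))
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)

def model_posterior(prev_post, Z, lik):
    """Posterior probability of each model at time t.

    prev_post[i]: P(S_{t-1} = i | x_{1:t-1})
    Z[i, j]:      predefined probability of switching from model i to j
    lik[j]:       likelihood L_t(j) of the new observation under model j
    """
    prior = Z.T @ prev_post   # sum_i Z(i, j) * prev_post[i]
    unnorm = lik * prior
    c = unnorm.sum()          # normalising constant c
    return unnorm / c
```

As a usage example with hypothetical numbers, take `prev_post = [0.9, 0.1]`, a transition matrix `[[0.95, 0.05], [0.0, 1.0]]` that forbids returning from the linear phase to the constant phase, and likelihoods `[0.2, 0.8]`. The probability of the second (linear degradation) model then rises from 0.1 to roughly 0.40, and the most probable model is the one with the largest posterior.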


In this implementation, the outputs of the trained ensemble
members are taken to be the observations, and the switching models
correspond to the two KF models expressed in Eq. (8). The most
probable sequence of models is first determined by the SKF; the
corresponding KF models are then applied to aggregate the outputs
of the individual ensemble members to obtain the estimated RUL
value. Figure 7 shows an example of the SKF algorithm estimating
the degradation of an engine from the training set. It can be
observed that the switching point between the two models predicted
by the SKF corresponds well with the predefined kink location in
the RUL function. It should also be noted that the initial
conditions of the Kalman filter are re-initialized for each
engine.

Figure 7. Example of SKF Ensemble output on a training engine

4.5.  Results 

In this section, the performance of the SKF ensemble is
illustrated and compared with the original KF ensemble method. The
KF ensemble was recreated to the best of the authors' knowledge
based on the details given in Peel (2008). Furthermore, results
obtained in Section 3.3 are also included for comparison, to
highlight the effectiveness of ensemble methods. As in previous
sections, all experiments were repeated for a total of 10 trials,
and the results obtained from these trials are expressed in the
form of box plots.

4.5.1.  C-MAPSS  Dataset 

Figure 8 illustrates the scores of all methods described in this
paper for all four sub-datasets within C-MAPSS. It is observed
that both the linear MLP and the KF ensemble display high mean
scores and large variance in scores. In addition, all four methods
achieve RMSE values of the same order (Figure 9). Based on these
observations, coupled with the characteristics of each evaluation
metric (Figure 1), it can be inferred that the high scores are
caused by certain outliers in the RUL predictions. This phenomenon
can probably be attributed to the use of the linear RUL function,
which might lead to overestimation of the RUL and thus to
significantly higher scores.

In addition, the high scores exhibited by the linear MLP and the
KF ensemble result in a badly scaled box plot, making it difficult
to illustrate and compare the relative performance of the
remaining algorithms. A more in-depth comparison of the four
methods therefore focuses mainly on the RMSE values (Figure 9).

Based on Figure 9, it can also be deduced that the SKF ensemble
significantly outperforms the KF ensemble. The SKF ensemble
achieves much lower RMSE values, which is most likely attributable
to the use of the kink RUL function to model the degradation of
the system. These results reaffirm the hypothesis arrived at in
Section 3.3 that the kink RUL function is a much more accurate
model for this dataset.

Figure 8. Scores of various algorithms for all C-MAPSS Datasets.

Figure 9. RMSE of various algorithms for all C-MAPSS Datasets.

As expected, both the KF and SKF ensemble methods result in
significantly lower RMSE variance compared to their respective
linear and kink MLPs. This can be attributed to the ability of
ensembles to aggregate the outputs of individual ensemble members,
which results in a lower variance. In addition, the use of the KF
helps to filter out noise from the output of the ensemble
(Figure 7), thus increasing robustness against inherent noise in
the data. The same observations can be made in Figure 10, which
shows in greater detail the comparison between the SKF ensemble
and the single MLP trained with a kink training function. In
addition to obtaining lower variance in RMSE values, the SKF
ensemble also exhibits lower mean RMSE values, showing that the
SKF ensemble outperforms the original MLP in both the accuracy and
the variance of its predictions.


Figure  10.  RMSE  of  MLP  with  Kink  training  function  and 
SKF  for  all  C-MAPSS  Datasets. 


Comparing the scores of the kink MLP and the SKF ensemble
(Figure 11) for all datasets shows that both methods achieve
scores within a similar range. However, the SKF ensemble slightly
outperforms the kink MLP by exhibiting less variance in scores
over the 10 trials. This can similarly be attributed to the
ability of the ensemble to be more robust to noise, as mentioned
in the previous paragraph.

Figure 11. Scores of MLP with Kink training function and SKF for
all C-MAPSS Datasets.


4.5.2.  PHM  2008  Dataset 

In  this  section,  the  algorithms  were  tested  on  the  test  dataset 
for  PHM  2008.  The  estimated  RULs  of  218  engines  within 
the  dataset  were  then  uploaded  to  the  NASA  Data  Repository 
website  and  a  single  score  was  then  returned  by  the  website. 
The  results  were  also  compared  with  available  literature  that 
provided  suitable  scores  for  comparison. 

Table 2. Scores for various algorithms on PHM 2008 test dataset

Methods                                                | Scores
Single MLP (Linear)                                    | 118338
Single MLP (Kink)                                      | 6103.46
KF Ensemble                                            | 5590.03
SKF Ensemble                                           | 2922.33
Gibbs Filtering (Le Son, Fouladirad, & Barros, 2012)   | 4170

Based on the results, it can be seen that the SKF ensemble
produces a significantly lower score and outperforms the other
methods. However, as mentioned in Section 2, submission of
estimated RULs is limited to once a day. The scores shown in
Table 2 are therefore from a single submission and are also
subject to variance, as seen in earlier sections.

5.  Conclusion 

In this paper we have demonstrated the effectiveness of an SKF
ensemble for systems with non-linear degradation patterns. In
addition, the performance of the SKF ensemble on NASA's C-MAPSS
dataset has shown improvement over other methods in the
literature. Implementation on these simulated datasets simply
serves as a proof of concept for the proposed method at this
stage. The method also has wide applications in other prognostic
situations where the system involved has more than one degradation
mode. An example would be where the degradation pattern of the
system changes due to external factors such as operating
conditions or overhaul maintenance. In view of the range of
possible applications, the authors plan to implement the proposed
method on a real-world dataset and validate its effectiveness.